Perplexity of n-Gram and Dependency Language Models
Authors
Abstract
Language models (LMs) are essential components of many applications such as speech recognition and machine translation. LMs factorize the probability of a string of words into a product of P(w_i | h_i), where h_i is the context (history) of the word w_i. Most LMs use the previous words as the context. This paper presents two alternative approaches: post-ngram LMs, which use the following words as the context, and dependency LMs, which exploit the dependency structure of a sentence and can use, for example, the governing word as the context. Dependency LMs could be useful whenever the topology of a dependency tree is available but its lexical labels are unknown, e.g. in tree-to-tree machine translation. In comparison with a baseline interpolated trigram LM, both approaches achieve significantly lower perplexity for all seven tested languages (Arabic, Catalan, Czech, English, Hungarian, Italian, Turkish).
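To make the factorization concrete, the following minimal Python sketch computes perplexity under three context definitions: the previous word (a classical bigram), the following word (a post-bigram), and the governing word taken from a dependency tree. It is an illustration only, not the paper's implementation; the toy corpus, add-one smoothing, and bigram-sized contexts are assumptions made to keep the example short.

```python
import math
from collections import defaultdict

# Three ways to define the context h_i in P(w_i | h_i).
def prev_word(sent, heads, i):    # classical n-gram context: the previous word
    return sent[i - 1] if i > 0 else "<s>"

def next_word(sent, heads, i):    # post-ngram context: the following word
    return sent[i + 1] if i + 1 < len(sent) else "</s>"

def governor(sent, heads, i):     # dependency LM context: the governing word
    return sent[heads[i]] if heads[i] >= 0 else "<root>"

def train(corpus, context_fn):
    """Estimate P(w | h) from counts, with add-one smoothing applied at query time."""
    counts, totals, vocab = defaultdict(int), defaultdict(int), set()
    for sent, heads in corpus:
        for i, w in enumerate(sent):
            h = context_fn(sent, heads, i)
            counts[(h, w)] += 1
            totals[h] += 1
            vocab.add(w)
    V = len(vocab)
    def logprob(sent, heads, i):
        h = context_fn(sent, heads, i)
        return math.log((counts[(h, sent[i])] + 1) / (totals[h] + V))
    return logprob

def perplexity(corpus, logprob):
    """Perplexity = exp of the average negative log P(w_i | h_i) per token."""
    lp, n = 0.0, 0
    for sent, heads in corpus:
        for i in range(len(sent)):
            lp += logprob(sent, heads, i)
            n += 1
    return math.exp(-lp / n)

# Toy corpus: (tokens, head index of each token), with -1 marking the root.
corpus = [(["she", "reads", "books"], [1, -1, 1]),
          (["he", "reads", "papers"], [1, -1, 1])]

for name, fn in [("previous word", prev_word), ("following word", next_word),
                 ("governing word", governor)]:
    print(name, round(perplexity(corpus, train(corpus, fn)), 2))
```

Only the definition of the context h_i changes between the three variants; the counting and the perplexity computation are shared, which is why a dependency LM can be evaluated in exactly the same way as an n-gram LM once a tree topology is available.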
Similar papers
Generative Incremental Dependency Parsing with Neural Networks
We propose a neural network model for scalable generative transition-based dependency parsing. A probability distribution over both sentences and transition sequences is parameterised by a feedforward neural network. The model surpasses the accuracy and speed of previous generative dependency parsers, reaching 91.1% UAS. Perplexity results show a strong improvement over n-gram language models, ...
A Maximum Entropy Language Model with Topic Sensitive Features
We present a maximum entropy approach to topic sensitive language modeling. By classifying the training data into different parts according to topic, extracting topic sensitive unigram features, and combining these new features with conventional N-grams in language modeling, we build a topic sensitive bigram model. This model improves both perplexity and word error rate. keywords: maximum entrop...
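The snippet above describes the construction only at a high level. As a hedged illustration (the feature names and topic labels below are invented for this sketch, not taken from the paper), topic sensitive unigram features can simply be emitted alongside conventional n-gram features before training a maximum entropy model:

```python
# Feature extraction sketch: conventional bigram and unigram features are combined
# with a topic-conditioned unigram feature for the same word.
def features(prev_word: str, word: str, topic: str) -> list:
    return [
        f"bigram={prev_word}_{word}",       # conventional N-gram feature
        f"unigram={word}",                  # plain unigram feature
        f"topic_unigram={topic}_{word}",    # topic sensitive unigram feature
    ]

# The same word receives different topic features in different document classes.
print(features("the", "goal", topic="sports"))
print(features("the", "goal", topic="finance"))
```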
Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation
The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model ...
Maximum Entropy Language Modeling with Non-Local and Syntactic Dependencies
Standard N-gram language models exploit information only from the immediate past to predict the future word. To improve the performance of a language model, two different kinds of long-range dependence, the syntactic structure and the topic of sentences, are taken into consideration. The likelihood of many words varies greatly with the topic of discussion, and topics capture this difference. Synta...
Variable-length sequence language model for large vocabulary continuous dictation machine
In natural language, some sequences of words are very frequent. A classical language model, such as an n-gram model, does not adequately account for such sequences, because it underestimates their probabilities. A better approach consists in modeling word sequences as if they were individual dictionary elements. Sequences are considered as additional entries of the word lexicon, on which language mod...
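As a rough sketch of that idea (the pair-counting threshold and the greedy left-to-right merge are assumptions for illustration, not the paper's algorithm), frequent word pairs can be promoted to single lexicon entries before n-gram training:

```python
from collections import Counter

def find_frequent_pairs(corpus, min_count=2):
    """Count adjacent word pairs and keep those seen at least min_count times."""
    pairs = Counter()
    for sent in corpus:
        pairs.update(zip(sent, sent[1:]))
    return {p for p, c in pairs.items() if c >= min_count}

def merge_sequences(sent, frequent_pairs):
    """Greedily replace frequent pairs with single merged tokens (new lexicon entries)."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) in frequent_pairs:
            out.append(sent[i] + "_" + sent[i + 1])   # treat the sequence as one word
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out

corpus = [["turn", "on", "the", "light"], ["turn", "on", "the", "radio"]]
pairs = find_frequent_pairs(corpus)
print([merge_sequences(s, pairs) for s in corpus])
```

After this retokenization, a standard n-gram model is trained over the enlarged lexicon, so a frequent multi-word sequence contributes a single conditional probability instead of a product of several underestimated ones.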